52 research outputs found

A Similarity Measure for GPU Kernel Subgraph Matching

Author: A Sabne
BP Miller
C Böhm
F Zhang
G Ammons
L Adhianto
MH Williams
R Lim
R Singh
RC Gonzales
SS Shende
T Ball
Publication date: 21/03/2019

Accelerator architectures specialize in executing SIMD (single instruction, multiple data) in lockstep. Because the majority of CUDA applications are parallelized loops, control flow information can provide an in-depth characterization of a kernel. CUDAflow is a tool that statically separates CUDA binaries into basic block regions and dynamically measures instruction and basic block frequencies. CUDAflow captures this information in a control flow graph (CFG) and performs subgraph matching across the CFGs of various kernels to gain insight into an application's resource requirements, based on the shape and traversal of the graph, the instruction operations executed, and the registers allocated, among other information. The utility of CUDAflow is demonstrated with SHOC and Rodinia application case studies on a variety of GPU architectures, revealing novel thread divergence characteristics that help end users, autotuners, and compilers generate high-performing code.

arXiv.org e-Print Archive

Crossref

In-Vitro Anti-Fungal Activity and Phytochemical Screening of Stem Bark Extracts from Ventilago denticulata

Author: Bansode AS
Bhalerao PB
Bhalerao SS
Devhadrao NV
Dhage OL
Shende VS
Tambe AB
Thokal SH
Publication venue: 'Society of Pharmaceutical Tecnocrats'
Publication date: 15/08/2019

The objective of the present study was to assess the antifungal activity of petroleum ether, acetone, ethyl acetate, and ethanol bark extracts of Ventilago denticulata (VD). The material was dried in shade, made into a coarse powder, and a weighed quantity of the powder (1000 g) was subjected to hot percolation in a Soxhlet apparatus using petroleum ether, ethyl acetate, acetone, and ethanol at a temperature range of 40-80 °C. Phytochemical tests were done for the presence of phytoconstituents such as glycosides, alkaloids, tannins, steroids, and flavonoids. The antifungal activity was assessed by the cup method using Sabouraud's agar as the medium. Plates were incubated at 25 °C for 42 hr and later observed for zones of inhibition. The effect of the extracts on fungal isolates was compared with griseofulvin at a concentration of 10 mg/ml. The ethyl acetate extract showed an antifungal effect at both low and high doses. The petroleum ether, acetone, and ethanolic extracts did not produce any antifungal effect at either dose. The ethyl acetate extract showed a zone of inhibition of 10 mm at the low dose (T1, 10 mg/ml) and 16 mm at the high dose (T2, 20 mg/ml). Keywords: Ventilago denticulata, antifungal, griseofulvin

Journal of Drug Delivery and Therapeutics (JDDT)

INTERACTION PARAMETERS OF ALUMINUM BASE BINARY PHASE-DIAGRAMS

Author: BALAKRISHNA SS
MALLIK AK
SHENDE CB
Publication venue: 'Elsevier BV'
Publication date: 01/01/1983

Dspace at IIT Bombay

Performance analysis of GPU programming models using the roofline scaling trajectories

Author: A Ilic
A Marowka
L Adhianto
R Xu
S Cook
S Williams
SS Shende
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

Performance analysis is a daunting job, especially for rapidly evolving accelerator technologies. The Roofline Scaling Trajectories technique aims at diagnosing various performance bottlenecks for GPU programming models through visually intuitive Roofline plots. In this work, we introduce the use of Roofline Scaling Trajectories to capture major performance bottlenecks on NVIDIA Volta GPU architectures, such as warp efficiency, occupancy, and locality. Using this analysis technique, we explain the performance characteristics of the NAS Parallel Benchmarks (NPB) written with two programming models, CUDA and OpenACC. We present the influence of the programming model on the performance and scaling characteristics. We also leverage the insights of the Roofline Scaling Trajectory analysis to tune some of the NAS Parallel Benchmarks, achieving up to 2× speedup.

Crossref

eScholarship - University of California

Accurate and Complete Hardware Profiling for OpenMP

Author: A Drebes
A Muddukrishna
A Pop
L Adhianto
O Pele
SS Shende
Y Rubner
Publication venue: Springer Nature
Publication date: 26/05/2017

Crossref

The University of Manchester - Institutional Repository

Overview of Application Instrumentation for Performance Analysis and Tuning

Author: A Haidar
D Barthou
D Terpstra
J Eastep
J Schuchart
J Treibig
M Geimer
M Hähnel
N Nethercote
SL Graham
SS Shende
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Profiling and tuning of parallel applications is an essential part of HPC. Analysis and improvement of the hot spots of an application can be done using one of many available tools that measure the resource consumption of each instrumented part of the code. Since complex applications show different behavior in each part of the code, it is desirable to insert instrumentation to separate these parts. Besides manual instrumentation, some profiling libraries provide other ways of instrumenting code. Of these, binary patching is the most universal mechanism, and it greatly improves the user-friendliness and robustness of the tool. We provide an overview of the most commonly used binary patching tools and show a workflow for using them to implement a binary instrumentation tool for any profiler or autotuner. We have also evaluated the minimum overhead of manual and binary instrumentation.

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Analysis of the Jobs Resource Utilization on a Production System

Author: A Nataraj
AB Yoo
B Song
C Ernemann
DG Feitelson
L Rudolph
ML Massie
R Jain
SJ Chapin
SS Shende
U Lublin
Y Zhang
Publication date: 01/01/2013

In the HPC community, the System Utilization metric makes it possible to determine whether the resources of the cluster are efficiently used by the batch scheduler. This metric assumes that all the allocated resources (memory, disk, processors, etc.) are utilized full time. To optimize system performance, we have to consider the effective physical consumption by jobs relative to their resource allocations. This information gives insight into whether the cluster resources are efficiently used by the jobs. In this work, we propose an analysis of production clusters based on job resource utilization. The principle is to simultaneously collect traces from the job scheduler (provided by logs) and of the jobs' resource consumption. The latter has been realized by developing a job monitoring tool, whose impact on the system has been measured as lightweight (0.35% slowdown). The key point is to statistically analyze both traces to detect and explain underutilization of the resources. This makes it possible to detect abnormal behavior and bottlenecks in the cluster that lead to poor scalability, and to justify optimizations such as gang scheduling or best-effort scheduling. This method has been applied to two medium-sized production clusters over a period of eight months.

CiteSeerX

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Performance Monitoring and Analysis of Task-Based OpenMP

Author: B Mohr
D Lorenz
D Schmidl
E Ayguadé
K Fürlinger
K Hu
Kai Hu
Kai Wu
M Geimer
Maria Schilstra
SS Shende
TW Curry
Y Ding
Y Lin
Yi Ding
Zhenlong Zhao
Publication venue: 'Public Library of Science (PLoS)'

Crossref

Score-P and OMPT: Navigating the Perils of Callback-Driven Parallel Runtime Introspection

Author: A Knüpfer
AE Eichenberger
B Mohr
C Liao
D Lorenz
I Zhukov
J Protze
JM Bull
K Fürlinger
KA Huck
M Geimer
Matthias S. Müller
P Saviankou
R Schöne
S Benedict
SS Shende
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Event-based performance analysis aims at modeling the behavior of parallel applications through a series of state transitions during execution. Different approaches to obtain such transition points for OpenMP programs include source-level instrumentation (e.g., OPARI) and callback-driven runtime support (e.g., OMPT). In this paper, we revisit a previous evaluation and comparison of OPARI and an LLVM OMPT implementation (now updated to the OpenMP 5.0 specification) in the context of Score-P. We describe the challenges faced while trying to use OMPT as a drop-in replacement for the existing instrumentation-based approach and the changes in event order that could not be avoided. Furthermore, we provide details on Score-P measurements using OPARI and OMPT as event sources with the EPCC and SPEC OpenMP benchmark suites.

Crossref

Publikationsserver der RWTH Aachen University

Juelich Shared Electronic Resources

Towards an energy-aware scientific I/O interface

Author: A Knüpfer
B Rountree
CH Hsu
J Lofstead
Julian M. Kunkel
M Burtscher
M Geimer
M Gerndt
Michael Kuhn
S Huang
SS Shende
T Minartz
Thomas Ludwig
Timo Minartz
TN Minh
V Freeh
VW Freeh
W Smith
Y Hotta
Publication venue: 'Springer Science and Business Media LLC'

Crossref